Complementarity of MFCC, PLP and Gabor features in the presence of speech-intrinsic variabilities
Abstract
In this study, we investigate the effect of speech-intrinsic variabilities such as speaking rate, effort, and speaking style on automatic speech recognition (ASR). We analyze the influence of these variabilities, as well as extrinsic factors (i.e., additive noise), on the most common features in ASR (mel-frequency cepstral coefficients, MFCCs, and perceptual linear prediction features, PLPs) and on spectro-temporal Gabor features. MFCCs performed best for clean speech, whereas Gabor features were found to be the most robust under extrinsic variability. Intrinsic variations were found to have a strong impact on error rates. While performance with MFCCs and PLPs degraded in much the same way, Gabor features exhibited a different sensitivity to these variabilities and are, for example, well suited to recognizing speech with varying pitch. The results suggest that spectro-temporal and classic features carry complementary information, which could be exploited in feature-stream experiments.
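To make the term "spectro-temporal Gabor feature" concrete, the following is a minimal sketch of one 2D Gabor filter applied to a log-mel spectrogram. It is not the feature extraction used in the study: the Gaussian envelope, the filter size, the modulation frequencies, and the function names `gabor_filter_2d` and `gabor_features` are all illustrative assumptions.

```python
import numpy as np

def gabor_filter_2d(omega_t, omega_f, size_t=25, size_f=15):
    """Real 2D Gabor filter: Gaussian envelope times a cosine carrier.

    omega_t / omega_f are temporal / spectral modulation frequencies in
    radians per frame / per band (illustrative values, not from the paper).
    Returns an array of shape (size_f, size_t).
    """
    t = np.arange(size_t) - size_t // 2
    f = np.arange(size_f) - size_f // 2
    ff, tt = np.meshgrid(f, t, indexing="ij")
    # Assumed envelope: Gaussian with a width tied to the filter size.
    envelope = np.exp(-0.5 * ((tt / (size_t / 6.0)) ** 2
                              + (ff / (size_f / 6.0)) ** 2))
    carrier = np.cos(omega_t * tt + omega_f * ff)
    g = envelope * carrier
    # Remove the DC component so constant offsets produce no response.
    return g - g.mean()

def gabor_features(log_mel, filt):
    """Correlate a (bands x frames) spectrogram with one filter.

    Edges are padded by repeating border values, so the output has the
    same shape as the input. Assumes odd filter dimensions.
    """
    n_f, n_t = log_mel.shape
    k_f, k_t = filt.shape
    padded = np.pad(log_mel, ((k_f // 2, k_f // 2), (k_t // 2, k_t // 2)),
                    mode="edge")
    out = np.empty((n_f, n_t), dtype=float)
    for i in range(n_f):
        for j in range(n_t):
            out[i, j] = np.sum(padded[i:i + k_f, j:j + k_t] * filt)
    return out
```

In practice a bank of such filters with different `omega_t` and `omega_f` values would be applied and the outputs stacked into one feature vector per frame. Because each filter spans several frames and several frequency bands at once, it responds to joint spectro-temporal patterns that frame-wise cepstral features such as MFCCs or PLPs do not capture directly.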
Similar resources
Diagnostics of Speech Recognition: On Evaluating Feature Set Performance
In this paper we present an explorative study of diagnostics of speech recognition for finding the subsets of features that are most informative about incorrect recognition when variable speech is recognized. The impact on both MFCC and PLP features is investigated. A standard HMM-GMM phoneme-based ASR system with no grammar is used for collection of all the correct and wrong decodings...
Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
Automatic recognition of emotional states from speech in noisy conditions has become an important research topic in emotional speech recognition in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...
Optimization and evaluation of Gabor feature sets for ASR
In order to enhance automatic speech recognition performance in adverse conditions, Gabor features motivated by physiological measurements in the primary auditory cortex were optimized and evaluated. In the Aurora 2 experimental setup such localized, spectro-temporal filters combined with a Tandem system yield robust performance with a feature set size of 30. Improved results can be obtained wh...
Improved phoneme recognition by integrating evidence from spectro-temporal and cepstral features
Gabor features have been proposed for extracting spectro-temporal modulation information and have yielded significant improvements in recognition performance. In this paper, we propose the integration of Gabor posteriors with MFCC posteriors, yielding a relative improvement of 14.3% over an MFCC Tandem system. We analyze for different types of acoustic units the complementarity between Gabor featu...
Informative spectro-temporal bottleneck features for noise-robust speech recognition
Spectro-temporal Gabor features based on auditory knowledge have improved word accuracy for automatic speech recognition in the presence of noise. In previous work, we generated robust spectro-temporal features that incorporated the power normalized cepstral coefficient (PNCC) algorithm. The corresponding power normalized spectrum (PNS) is then processed by many Gabor filters, yielding a high d...